class: title-slide

<br>
<br>

# A Data Mining Approach for Detecting Collusion in Unproctored Online Exams<br>

.padding_left.pull-down.white[
J. Langerbein, T. Massing, .bold[_J. Klenke_], M. Striewe, M. Goedicke, C. Hanck, N. Reckmann

<br>
<br>
<br>

`\(16^{th}\)` International Conference on Educational Data Mining

Bangalore, 11-14 July, 2023
]

---

<h1> Outline </h1>

`\(\quad\)`

1. [Introduction](#introduction)
1. [Related work](#related_work)
1. [Methodology](#methodology)
1. [Empirical Results](#empirical_results)
1. [Discussion](#discussion)
1. [References](#references)

---
name: introduction

# Introduction

* COVID-19 forced universities to switch to online classes and exams
* Proctoring online exams with video conferencing software was often prohibited by data protection regulations and economically infeasible
* In this case study, take-home exams were conducted as open-book exams, but collaboration was strictly prohibited
* Hierarchical clustering algorithms were used to identify groups of potentially colluding students
* The method successfully found groups with nearly identical exams
* A proctored comparison group helped categorize student groups as "outstandingly similar"

---
name: related_work

<h1> Related work </h1>

* Limited research exists on unproctored exams at universities prior to the pandemic
* <a href='#bib-cleophas2021s'>Cleophas et al. (2021)</a> propose a method using event logs to detect collusion in unproctored exams
* Previous studies focused on similarity measures for programming exams based on keyboard patterns, e.g. <a href='#bib-Hellas_2017'>Hellas et al. (2017)</a> and <a href='#bib-Leinonen_2016'>Leinonen et al. (2016)</a>
* Other literature (e.g. 
<a href='#bib-hemming2010online'>Hemming (2010)</a>) relies on surveys or interviews and lacks data on actual student behavior regarding collusion
* Some studies suggest that unsupervised online exams may lead to collusion
* <a href='#bib-hollister2009proctored'>Hollister and Berenson (2009)</a> used GPA and final exam scores to analyze collusion, but no data collected during the exam

.blockquote[
* Our goal is to use this method for statistics courses and strengthen the analysis with a comparison group
]

---
name: methodology

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

* Data for the study was collected from the *Descriptive Statistics* course at the University of Duisburg-Essen, Germany
* The exams consisted of arithmetical problems, programming tasks in the statistical programming language `R`, and a short essay task
* Both exams were conducted digitally with the e-assessment system [JACK](https://www.uni-due.de/zim/services/jack.php)
  - Event logs captured students' activities and time stamps during the exams, and points achieved per task were recorded
* The test group took the unproctored exam at home during the COVID-19 pandemic, while the comparison group took a proctored exam on university premises
* During data cleaning, students with minimal participation or achievement were removed
* The difference between the two courses is marginal; the course content and objectives, as well as the course structure, have not changed notably over time

---

<h1> Methodology — <span style="font-size: 0.8em;"> Data set </span> </h1>

<br>

<table>
<thead>
<tr>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 4px; padding-left: 4px; background-color: #ffffff !important;" colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">$$$$</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; padding-right: 
4px; padding-left: 4px; background-color: #ffffff !important;" colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Groups</div></th>
</tr>
<tr>
<th style="text-align:left;font-weight: bold;background-color: #ffffff !important;"> </th>
<th style="text-align:center;font-weight: bold;background-color: #ffffff !important;"> Comparison </th>
<th style="text-align:center;font-weight: bold;background-color: #ffffff !important;"> Test </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;min-width: 10em; font-weight: bold;"> Year </td>
<td style="text-align:center;"> 18/19 </td>
<td style="text-align:center;"> 20/21 </td>
</tr>
<tr>
<td style="text-align:left;min-width: 10em; font-weight: bold;"> N </td>
<td style="text-align:center;"> 109 </td>
<td style="text-align:center;"> 151 </td>
</tr>
<tr>
<td style="text-align:left;min-width: 10em; font-weight: bold;"> Style </td>
<td style="text-align:center;"> proctored </td>
<td style="text-align:center;"> unproctored </td>
</tr>
<tr>
<td style="text-align:left;min-width: 10em; font-weight: bold;"> Total points </td>
<td style="text-align:center;"> 60 </td>
<td style="text-align:center;"> 60 </td>
</tr>
<tr>
<td style="text-align:left;min-width: 10em; font-weight: bold;"> Subtasks </td>
<td style="text-align:center;"> 19 </td>
<td style="text-align:center;"> 17 </td>
</tr>
<tr>
<td style="text-align:left;min-width: 10em; font-weight: bold;"> Duration </td>
<td style="text-align:center;"> 60 + 10 </td>
<td style="text-align:center;"> 60 + 10 </td>
</tr>
</tbody>
</table>

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

* Agglomerative (bottom-up) hierarchical clustering algorithm
* Global pairwise dissimilarities `\(D(x_i, x_{i'})\)`

`$$D(x_i, x_{i'}) = \frac{1}{h} \sum_{j=1}^h w_j \cdot d_j(x_{ij}, x_{i'j}) \quad \text{with} \quad \sum_{j=1}^h w_j = 1$$`

* With
  * `\(d_j(x_{ij}, x_{i'j})\)` pairwise attribute dissimilarity
  * `\(i = 1, ..., N\)` students
  * `\(j = 1, ..., 
h\)` attributes
* We compared two different kinds of attributes
  * Dissimilarities in the students' event patterns (time of submission)
  * Dissimilarities in points achieved

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

<br>

Dissimilarities in the students' event patterns (time of submission) for each task `\(j\)`

* `\(d_j^L(v_{ij}, v_{i'j})\)` with weights `\(w_j^L\)`
* We divided the examination into `\(m = 1, ... , 70\)` intervals
* `\(v_{ijm}\)` denotes the number of answers submitted by student `\(i\)` for task `\(j\)` in the `\(m\)`-th interval
* The Manhattan metric is used to calculate the pairwise attribute dissimilarity

`\(\quad\)`

`$$d_j^L(v_{ij}, v_{i'j}) = \sum_{m=1}^{70} | v_{ijm} - v_{i'jm} |$$`

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

<br>

Dissimilarities in points achieved for each task `\(j\)`

* `\(d_j^P(s_{ij}, s_{i'j})\)` with weights `\(w_j^P\)`
* `\(s_{ij}\)` denotes the points achieved by student `\(i\)` in the `\(j\)`-th subtask
* The absolute difference is used as the dissimilarity measure

`\(\quad\)`

`$$d_j^P(s_{ij}, s_{i'j}) = | s_{ij} - s_{i'j} |$$`

---

<h1> Methodology — <span style="font-size: 0.8em;"> Model </span> </h1>

### Full model

`$$D(s_i, s_{i'}, v_i, v_{i'}) = \dfrac{1}{h} \sum_{j=1}^h \left(w_j^P \cdot d_j^P (s_{ij}, s_{i'j}) + w_j^L \cdot d_j^L (v_{ij}, v_{i'j}) \right) \quad \text{with} \quad \sum_{j=1}^h \left( w_j^P + w_j^L \right) = 1$$`

* The weights `\(w_j^P\)` and `\(w_j^L\)` control the influence of each attribute on the global object dissimilarity
* We reduced the weights for
  * `R`-tasks and free-text questions, since the event log might not be comparable in these cases
  * Points achieved
* Since dissimilarity measures depend on scale, the attributes were normalized

---

<h2>Empirical results — <span style="font-size: 0.8em;">Dendrogram</span></h2>

.panelset.sideways[
.panel[.panel-name[Control group]
<div class="figure" style="text-align: center">
<img 
src="data:image/png;base64,#../resources/graphics/dendogram_control.png" alt="<strong> Figure 1: </strong> Dendrogram produced by average linkage clustering of the proctored control group (2018/19). <strong> G-L </strong> mark the clusters with the lowest dissimilarity." width="150%" />
<p class="caption"><strong> Figure 1: </strong> Dendrogram produced by average linkage clustering of the proctored control group (2018/19). <strong> G-L </strong> mark the clusters with the lowest dissimilarity.</p>
</div>
]
.panel[.panel-name[Test group]
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/dendogram_test.png" alt="<strong>Figure 2:</strong> Dendrogram produced by average linkage clustering of the unproctored test group (2020/21). <strong> A-F </strong> mark the clusters with the lowest dissimilarity." width="100%" />
<p class="caption"><strong>Figure 2:</strong> Dendrogram produced by average linkage clustering of the unproctored test group (2020/21). <strong> A-F </strong> mark the clusters with the lowest dissimilarity.</p>
</div>
]
]

--

<h3 style="margin-bottom: -15px;">Results</h3>
<p style="margin-top: 0; font-size: 70%;">
<ul>
<li>The control group has an overall higher level of dissimilarity and does not contain any strikingly similar clusters. The six lowest clusters of the test group stand out in terms of similarity, especially clusters <strong>A</strong>, <strong>B</strong> and <strong>E</strong>.</li>
</ul>
</p>

---

<h2>Empirical results — <span style="font-size: 0.8em;">Distribution of measured distances </span> </h2>

<br>
<br>

.pull-left[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/boxplot_original.png" alt="<strong>Figure 3.1:</strong> Comparison of the non-normalised distance measures." 
width="100%" height="60%" />
<p class="caption"><strong>Figure 3.1:</strong> Comparison of the non-normalised distance measures.</p>
</div>
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/boxplot_norm.png" alt="<strong>Figure 3.2:</strong> Comparison of the normalised distance measures." width="100%" height="60%" />
<p class="caption"><strong>Figure 3.2:</strong> Comparison of the normalised distance measures.</p>
</div>
]

---

<h2>Empirical results — <span style="font-size: 0.8em;">Cluster comparison</span></h2>

.panelset[
.panel[.panel-name[AB]
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_ab.png" alt="<strong>Figure 4.1:</strong> Comparison of the event logs and achieved points of the clusters <strong>A</strong> and <strong>B</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.1:</strong> Comparison of the event logs and achieved points of the clusters <strong>A</strong> and <strong>B</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[CD]
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_cd.png" alt="<strong>Figure 4.2:</strong> Comparison of the event logs and achieved points of the clusters <strong>C</strong> and <strong>D</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.2:</strong> Comparison of the event logs and achieved points of the clusters <strong>C</strong> and <strong>D</strong> from the test group (2020/21). 
Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[EF]
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_ef.png" alt="<strong>Figure 4.3:</strong> Comparison of the event logs and achieved points of the clusters <strong>E</strong> and <strong>F</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.3:</strong> Comparison of the event logs and achieved points of the clusters <strong>E</strong> and <strong>F</strong> from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[GH]
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_gh.png" alt="<strong>Figure 4.4:</strong> Comparison of the event logs and achieved points of the clusters <strong>G</strong> and <strong>H</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.4:</strong> Comparison of the event logs and achieved points of the clusters <strong>G</strong> and <strong>H</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[IJ]
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_ij.png" alt="<strong>Figure 4.5:</strong> Comparison of the event logs and achieved points of the clusters <strong>I</strong> and <strong>J</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." 
width="80%" />
<p class="caption"><strong>Figure 4.5:</strong> Comparison of the event logs and achieved points of the clusters <strong>I</strong> and <strong>J</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
.panel[.panel-name[KL]
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../resources/graphics/plot_kl.png" alt="<strong>Figure 4.6:</strong> Comparison of the event logs and achieved points of the clusters <strong>K</strong> and <strong>L</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask." width="80%" />
<p class="caption"><strong>Figure 4.6:</strong> Comparison of the event logs and achieved points of the clusters <strong>K</strong> and <strong>L</strong> from the control group (2018/19). Above the scatter plot, a bar chart is added to compare the points per subtask.</p>
</div>
]
]

---

<h1> Discussion </h1>

* The results of hierarchical clustering algorithms are presented in a dendrogram, which visualizes the order and the dissimilarity level at which objects are merged.
* A dendrogram resembles a tree structure, where objects are merged based on their dissimilarity in a bottom-up approach.
* Various hierarchical clustering algorithms exist, and the cophenetic correlation coefficient is used to assess how well each algorithm represents the original structure in the data.
* Average linkage clustering is deemed the most suitable algorithm for the analysis.
* The dendrogram shows compact clusters at medium dissimilarities, with three notable clusters (**A**, **B**, and **E**) consisting of two students each, indicating the absence of collusion in larger groups.
* Scatter plots and bar charts are used to examine the similarity of students' chronology and achieved points within clusters.
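---

<h1> Clustering sketch </h1>

The global dissimilarity from the Methodology slides and the average linkage merging behind the dendrograms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three toy students, the two tasks, the three time intervals (instead of 70), and all weights are made-up values, the function names are hypothetical, and the normalization step is omitted.

```python
from itertools import combinations


def manhattan(v_a, v_b):
    """Event-log dissimilarity d_j^L: Manhattan distance between interval counts."""
    return sum(abs(a - b) for a, b in zip(v_a, v_b))


def global_dissimilarity(s_a, s_b, v_a, v_b, w_points, w_log):
    """D = (1/h) * sum_j (w_j^P * |s_aj - s_bj| + w_j^L * d_j^L(v_aj, v_bj))."""
    h = len(s_a)
    total = 0.0
    for j in range(h):
        total += w_points[j] * abs(s_a[j] - s_b[j])
        total += w_log[j] * manhattan(v_a[j], v_b[j])
    return total / h


def average_distance(a, b, dist):
    """Average linkage: mean pairwise dissimilarity between clusters a and b."""
    return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))


def average_linkage(dist, n):
    """Naive agglomerative clustering; returns merges as (cluster, cluster, height)."""
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average dissimilarity.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0], pair[1], dist))
        merges.append((a, b, average_distance(a, b, dist)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Toy data (assumed, not from the study): points per task for each student ...
points = [[3, 2], [3, 2], [0, 1]]
# ... and answer counts per task and time interval for each student.
logs = [
    [[1, 0, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 1]],
]
w_points = [0.2, 0.2]  # assumed weights; together with w_log they sum to 1
w_log = [0.3, 0.3]

# Pairwise global dissimilarities between all students.
dist = {
    frozenset((i, k)): global_dissimilarity(
        points[i], points[k], logs[i], logs[k], w_points, w_log)
    for i, k in combinations(range(len(points)), 2)
}
merges = average_linkage(dist, len(points))
for a, b, height in merges:
    print(a, b, round(height, 3))
```

In this toy example, students 0 and 1 have identical points and event logs, so they are merged first at height 0, which corresponds to the kind of low-lying two-student cluster the dendrograms are inspected for. A library routine would normally replace the naive loop for real data.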
---
name: discussion

<h1> Discussion </h1>

* Comparison with the results from the comparison group supports the findings, indicating that collusion over the entire exam is unlikely, and the differences between the groups are not coincidental.
* The method successfully detects at least three clusters with nearly identical exams.
* The approach provides a basis for further examination of clusters based on comparison with a reference group, but the ground truth is not known, which limits the certainty of the conclusions.
* Nevertheless, the elevated risk of detection may discourage students from cheating in unproctored exams.
* This is not only an important step in adapting to the ongoing digitization of education; it also prepares us better for unforeseen situations such as the COVID-19 pandemic.

---

<h1> Further research </h1>

* Exploring the long-term effectiveness of the detection method in deterring students from colluding in exams, and its impact on academic integrity and student behavior.
* Developing and implementing methods to collect and analyze complementary evidence, with the aim of improving detection rates and understanding the extent of collusion among students.

---
name: references

# References

.font80[
Cleophas, C., C. Hoennige, F. Meisel, and P. Meyer (2021). "Who's Cheating? Mining Patterns of Collusion from Text and Events in Online Exams". In: _Mining Patterns of Collusion from Text and Events in Online Exams (April 12, 2021)_.

Hellas, A., J. Leinonen, and P. Ihantola (2017). _Plagiarism in Take-Home Exams: Help-Seeking, Collaboration, and Systematic Cheating_. ITiCSE '17. Bologna, Italy: Association for Computing Machinery, pp. 238-243. ISBN: 9781450347044. DOI: 10.1145/3059009.3059065. <https://doi.org/10.1145/3059009.3059065>.

Hemming, A. (2010). "Online tests and exams: lower standards or improved learning?" In: _The Law Teacher_ 44.3, pp. 283-308.

Hollister, K. K. and M. L. Berenson (2009). "Proctored versus unproctored online exams: Studying the impact of exam environment on student performance". In: _Decision Sciences Journal of Innovative Education_ 7.1, pp. 271-294.

Leinonen, J., K. Longi, A. Klami, A. Ahadi, and A. Vihavainen (2016). _Typing patterns and authentication in practical programming exams_, pp. 160-165.
]